Environment Setting
# Import required packages
library(tidytransit)
library(tidyverse)
library(tmap)
library(ggplot2)
library(gtfsrouter)
library(here)
library(units)
library(sf)
library(leaflet)
library(tidycensus)
library(plotly)
library(igraph)
library(tidygraph)
library(dodgr)
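# Helper functions from the gtfs_to_igraph repository (GTFS to igraph conversion)
source("https://raw.githubusercontent.com/BonwooKoo/gtfs_to_igraph/master/gtfs_to_igraph.R")
# Working directory: this path (and the 'setwd' environment variable it reads)
# is specific to the author's machine; change it to point to your own project folder
wd <- file.path(Sys.getenv('setwd'),"work/working/School/UA_2022/external/Lab/module_2")
setwd(wd)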
What is the General Transit Feed Specification (GTFS)?
Before GTFS, there was (to the best of my knowledge, at least) no standardized format for transit timetables and other associated information. Anyone using transit data from multiple transit agencies had to deal with a different data format for each agency. With GTFS, feeds from different agencies became standardized (although not perfectly), making it much easier to build applications that use the data.
Useful Links:
- tidytransit (R package) vignettes
- Google GTFS
- Open Mobility Data - Archive of GTFS feeds from around the world
Let’s download the GTFS feed provided by the Metropolitan Atlanta Rapid Transit Authority (MARTA).
# Option 1: read directly from the URL into R (commented out here)
# atl <- tidytransit::read_gtfs("http://www.itsmarta.com/google_transit_feed/google_transit.zip")
# Option 2: read a copy of the feed already saved in the working directory
atl <- tidytransit::read_gtfs('MARTA_GTFS_Latest_Feed.zip')
# Take a look
summary(atl)
Note that read_gtfs() can read a GTFS feed from a URL directly into your R environment (the commented-out line above). If you want a stable copy for yourself, you can instead save the GTFS feed to your hard drive first and then read it into R, as shown below.
# NOT RUN: From URL to hard drive, then into R
download_path <- file.path(getwd(), "GTFS_MARTA.zip")
# mode = "wb" writes in binary mode so the zip file is not corrupted on Windows
download.file("http://www.itsmarta.com/google_transit_feed/google_transit.zip",
              destfile = download_path, mode = "wb")
atl <- tidytransit::read_gtfs(download_path)
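If you re-run this lab often, you may not want to download the feed every time. Below is a minimal sketch of a hypothetical helper, get_gtfs_zip() (an illustrative name, not part of tidytransit), that downloads the zip only when no local copy exists yet and then returns the local path.

```r
# Hypothetical helper (not part of tidytransit): download a GTFS zip only
# if a local copy does not already exist, then return the local path.
get_gtfs_zip <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Binary mode keeps the zip archive intact on Windows
    utils::download.file(url, destfile = destfile, mode = "wb")
  }
  destfile
}

# Usage (NOT RUN): later calls reuse the file saved by the first one
# zip_path <- get_gtfs_zip("http://www.itsmarta.com/google_transit_feed/google_transit.zip",
#                          file.path(getwd(), "GTFS_MARTA.zip"))
# atl <- tidytransit::read_gtfs(zip_path)
```

The trade-off is that a cached copy can go stale; delete the local file whenever you want the agency's latest feed.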
A GTFS feed contains several relational tables describing transit service schedules, trips, stops, and routes.
Where to get the URL?
Here are some other sources of GTFS feeds:
Understand GTFS
A GTFS feed consists of multiple tables packaged together in a single zip file. Transit is a complex system with many components (e.g., routes, stops, service schedules) working together. The table below briefly describes what each table (read into R as a data.frame) contains; the descriptions are taken from Google’s GTFS documentation.
Description of tables in GTFS feed

Table name        Defines
----------------  ------------------------------------------------------------
agency            Transit agencies with service represented in this dataset.
stops             Stops where vehicles pick up or drop off riders. Also defines stations and station entrances.
routes            Transit routes. A route is a group of trips that are displayed to riders as a single service.
trips             Trips for each route. A trip is a sequence of two or more stops that occur during a specific time period.
stop_times        Times that a vehicle arrives at and departs from stops for each trip.
calendar          Service dates specified using a weekly schedule with start and end dates. This file is required unless all dates of service are defined in calendar_dates.txt.
calendar_dates    Exceptions for the services defined in calendar.txt. If calendar.txt is omitted, then calendar_dates.txt is required and must contain all dates of service.
fare_attributes   Fare information for a transit agency’s routes.
fare_rules        Rules to apply fares for itineraries.
shapes            Rules for mapping vehicle travel paths, sometimes referred to as route alignments.
frequencies       Headway (time between trips) for headway-based service or a compressed representation of fixed-schedule service.
transfers         Rules for making connections at transfer points between routes.
pathways          Pathways linking together locations within stations.
levels            Levels within stations.
feed_info         Dataset metadata, including publisher, version, and expiration information.
translations      Translated information of a transit agency.
attributions      Specifies the attributions that are applied to the dataset.
----------------  ------------------------------------------------------------
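To make the calendar / calendar_dates relationship concrete, here is a minimal sketch (written for this lab, not taken from tidytransit) of how to find the service_ids running on a given date: start from the weekly pattern in calendar, then apply the exceptions in calendar_dates, where exception_type 1 adds service on that date and 2 removes it. The toy tables mimic MARTA’s weekday service "5", Saturday service "3", and the 2021-09-06 exception that appear in the output later in this section; the end dates and the function name active_services() are hypothetical.

```r
# Toy tables modelled on MARTA's calendar output; end dates are made up
calendar <- data.frame(
  service_id = c("3", "5"),
  monday = c(0, 1), tuesday = c(0, 1), wednesday = c(0, 1), thursday = c(0, 1),
  friday = c(0, 1), saturday = c(1, 0), sunday = c(0, 0),
  start_date = as.Date(c("2021-08-14", "2021-08-14")),
  end_date = as.Date(c("2021-12-31", "2021-12-31"))
)
calendar_dates <- data.frame(
  service_id = "5",
  date = as.Date("2021-09-06"),
  exception_type = 2L  # 2 = service removed on this date
)

# Hypothetical helper: service_ids active on `date`
active_services <- function(date, calendar, calendar_dates) {
  date <- as.Date(date)
  # Weekday column name for this date (wday: 0 = Sunday), independent of locale
  days <- c("sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday")
  dow <- days[as.POSIXlt(date)$wday + 1]
  # Regular service: weekday flag is 1 and the date lies within the service period
  in_range <- calendar$start_date <= date & date <= calendar$end_date
  regular <- calendar$service_id[in_range & calendar[[dow]] == 1]
  # Exceptions on this date: 1 = added, 2 = removed
  ex <- calendar_dates[calendar_dates$date == date, ]
  added <- ex$service_id[ex$exception_type == 1]
  removed <- ex$service_id[ex$exception_type == 2]
  sort(setdiff(union(regular, added), removed))
}

active_services("2021-09-07", calendar, calendar_dates)  # a regular Tuesday
active_services("2021-09-06", calendar, calendar_dates)  # Monday, but service removed
```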
These are relational tables connected through a system of join keys (for example, trip_id links stop_times to trips, and route_id links trips to routes). The schematic below shows which tables are linked to which, and through which join keys. Understanding this structure is essential for working with GTFS.
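As a toy illustration of how the join keys work, the sketch below uses made-up miniature versions of stop_times, trips, and routes (key columns only, hypothetical values) and base R merge() to answer a question no single table can: which routes serve each stop? stop_times knows stops but not routes, so it has to go through trips.

```r
# Toy versions of three GTFS tables (hypothetical values, minimal columns)
routes <- data.frame(route_id = c("R1", "R2"),
                     route_short_name = c("1", "2"))
trips <- data.frame(trip_id = c("T1", "T2", "T3"),
                    route_id = c("R1", "R1", "R2"))
stop_times <- data.frame(trip_id = c("T1", "T1", "T2", "T3"),
                         stop_id = c("S1", "S2", "S1", "S3"),
                         stop_sequence = c(1L, 2L, 1L, 1L))

# stop_times -> trips via trip_id, then trips -> routes via route_id
st_trips <- merge(stop_times, trips, by = "trip_id")
st_routes <- merge(st_trips, routes, by = "route_id")

# Routes serving each stop, collapsed into one string per stop
routes_by_stop <- tapply(st_routes$route_short_name, st_routes$stop_id,
                         function(x) paste(sort(unique(x)), collapse = ","))
routes_by_stop
```

With the real feed, the same two joins would be made on atl$stop_times, atl$trips, and atl$routes (for example with dplyr::left_join()).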
IMAGE SOURCE: http://tidytransit.r-transit.org/articles/introduction.html
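What’s inside the atl object
Now, let’s take a look at the atl object, into which we read the GTFS feed from MARTA. This object is a list. names(atl) shows that it has 9 elements: 8 data.frames from the feed plus an extra element named "." that tidytransit adds for its own metadata. Notice that the table above lists many more tables than this: many GTFS files are optional, and MARTA’s feed includes only 8 of them.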
typeof(atl)
## [1] "list"
names(atl)
## [1] "agency" "calendar" "calendar_dates" "routes"
## [5] "shapes" "stop_times" "stops" "trips"
## [9] "."
# head(atl$calendar)
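Because a feed read by tidytransit is an ordinary named list, standard R list tools work on it. The sketch below uses a made-up stand-in called toy_gtfs (hypothetical, not real MARTA data), including a non-table element named "." to mimic the one in names(atl), and counts the rows of every data.frame while skipping anything that is not a table.

```r
# Hypothetical miniature "GTFS object": a named list of data frames plus a
# non-table element named "." (mimicking the extra element in names(atl))
toy_gtfs <- list(
  agency = data.frame(agency_id = "A1", agency_name = "Toy Transit"),
  stops  = data.frame(stop_id = c("S1", "S2", "S3")),
  trips  = data.frame(trip_id = c("T1", "T2")),
  "."    = list(note = "metadata")
)

# Keep only the data frames, then count the rows in each
tables <- Filter(is.data.frame, toy_gtfs)
sapply(tables, nrow)
```

On the real feed, the same idea would be sapply(Filter(is.data.frame, atl), nrow); tidytransit stores its tables as tibbles, which is.data.frame() recognizes.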
GTFS into geospatial format
The function gtfs_as_sf() converts the ‘shapes’ and ‘stops’ tables of a GTFS object into sf objects; the other tables are kept as they are.
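Conceptually, the shapes table is stored in a long format, one row per point (shape_id, shape_pt_lat, shape_pt_lon, shape_pt_sequence), and turning it into lines means ordering the points of each shape by shape_pt_sequence. The base R sketch below shows only that idea on a hypothetical toy table; it is not tidytransit’s actual implementation.

```r
# Toy shapes table in GTFS's long format: one row per point (hypothetical values)
shapes <- data.frame(
  shape_id = c("A", "A", "A", "B", "B"),
  shape_pt_lat = c(33.75, 33.76, 33.77, 33.80, 33.81),
  shape_pt_lon = c(-84.39, -84.38, -84.37, -84.30, -84.31),
  shape_pt_sequence = c(2L, 1L, 3L, 1L, 2L)
)

# For each shape_id, sort points by shape_pt_sequence and build an
# (x = longitude, y = latitude) coordinate matrix
shape_coords <- lapply(split(shapes, shapes$shape_id), function(s) {
  s <- s[order(s$shape_pt_sequence), ]
  cbind(x = s$shape_pt_lon, y = s$shape_pt_lat)
})
shape_coords$A
```

Each matrix could then be passed to sf::st_linestring() to get a LINESTRING; note the longitude-first (x, y) order. Stops are simpler, since each row already is one point: sf::st_as_sf(stops, coords = c("stop_lon", "stop_lat"), crs = 4326).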
atlsf <- tidytransit::gtfs_as_sf(atl, crs = 4326)
head(atlsf)  # head() on a list prints only its first 6 elements, so 'stops' is not shown below
## $agency
## # A tibble: 1 × 7
## agency_id agency_name agenc…¹ agenc…² agenc…³ agenc…⁴ agenc…⁵
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 MARTA Metropolitan Atlanta Rapid … http:/… Americ… en (404)8… custse…
## # … with abbreviated variable names ¹agency_url, ²agency_timezone,
## # ³agency_lang, ⁴agency_phone, ⁵agency_email
##
## $calendar
## # A tibble: 4 × 10
## service_id monday tuesday wednesday thursday friday saturday sunday start_date
## <chr> <int> <int> <int> <int> <int> <int> <int> <date>
## 1 2 0 0 0 0 0 0 0 2021-08-14
## 2 3 0 0 0 0 0 1 0 2021-08-14
## 3 4 0 0 0 0 0 0 1 2021-08-14
## 4 5 1 1 1 1 1 0 0 2021-08-14
## # … with 1 more variable: end_date <date>
## # ℹ Use `colnames()` to see all variable names
##
## $calendar_dates
## # A tibble: 3 × 3
## service_id date exception_type
## <chr> <date> <int>
## 1 5 2021-09-06 2
## 2 5 2021-11-25 2
## 3 5 2021-11-26 2
##
## $routes
## # A tibble: 118 × 8
## route_id route_short_name route_lon…¹ route…² route…³ route…⁴ route…⁵ route…⁶
## <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
## 1 15664 1 Marietta B… "" 3 https:… FF00FF ""
## 2 15665 2 Ponce de L… "" 3 https:… 008000 ""
## 3 15666 3 Martin Lut… "" 3 https:… FF8000 ""
## 4 15667 4 Moreland A… "" 3 https:… FF00FF ""
## 5 15668 5 Piedmont R… "" 3 https:… 00FFFF ""
## 6 15669 6 Clifton Ro… "" 3 https:… 008080 ""
## 7 15670 8 North Drui… "" 3 https:… 008000 ""
## 8 15671 9 Boulevard … "" 3 https:… 00FFFF ""
## 9 15672 12 Howell Mil… "" 3 https:… FF00FF ""
## 10 15673 14 14th Stree… "" 3 https:… 00FF00 ""
## # … with 108 more rows, and abbreviated variable names ¹route_long_name,
## # ²route_desc, ³route_type, ⁴route_url, ⁵route_color, ⁶route_text_color
## # ℹ Use `print(n = ...)` to see more rows
##
## $shapes
## Simple feature collection with 394 features and 1 field
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -84.67085 ymin: 33.4323 xmax: -84.08274 ymax: 34.10663
## Geodetic CRS: WGS 84
## First 10 features:
## shape_id geometry
## 1 119402 LINESTRING (-84.55227 33.77...
## 2 119403 LINESTRING (-84.3694 33.821...
## 3 119404 LINESTRING (-84.39192 33.75...
## 4 119405 LINESTRING (-84.50486 33.73...
## 5 119406 LINESTRING (-84.21014 33.85...
## 6 119407 LINESTRING (-84.30635 33.88...
## 7 119408 LINESTRING (-84.33644 33.95...
## 8 119409 LINESTRING (-84.44857 33.65...
## 9 119410 LINESTRING (-84.35409 33.75...
## 10 119411 LINESTRING (-84.35409 33.75...
##
## $stop_times
## # A tibble: 1,033,990 × 5
## trip_id arrival_time departure_time stop_id stop_sequence
## <chr> <time> <time> <chr> <int>
## 1 8044645 14:35:00 14:35:00 903320 1
## 2 8044645 14:36:02 14:36:02 903448 2
## 3 8044645 14:36:51 14:36:51 901144 3
## 4 8044645 14:37:43 14:37:43 904219 4
## 5 8044645 14:38:19 14:38:19 903850 5
## 6 8044645 14:38:39 14:38:39 903664 6
## 7 8044645 14:39:39 14:39:39 213268 7
## 8 8044645 14:40:12 14:40:12 213101 8
## 9 8044645 14:41:35 14:41:35 212926 9
## 10 8044645 14:41:53 14:41:53 211886 10
## # … with 1,033,980 more rows
## # ℹ Use `print(n = ...)` to see more rows
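A common first step with stop_times is computing scheduled travel time between consecutive stops. GTFS times are "HH:MM:SS" strings measured from the start of the service day, and hours can be 24 or more for trips that run past midnight (e.g., "25:10:00"), so they cannot always be parsed as ordinary clock times. The sketch below uses a hypothetical helper, gtfs_time_to_sec(), and the first four arrival times of trip 8044645 from the output above.

```r
# Hypothetical helper: "HH:MM:SS" GTFS time strings -> seconds after midnight.
# Splitting by hand means hours >= 24 are handled correctly.
gtfs_time_to_sec <- function(x) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  storage.mode(parts) <- "integer"
  parts[, 1] * 3600L + parts[, 2] * 60L + parts[, 3]
}

# First four stops of trip 8044645, copied from the stop_times output above
arrival_time <- c("14:35:00", "14:36:02", "14:36:51", "14:37:43")
secs <- gtfs_time_to_sec(arrival_time)

# Scheduled travel time (seconds) between consecutive stops
diff(secs)
```

Since tidytransit already parses these columns into time objects (shown as <time> above), as.numeric() on them should give seconds directly; a helper like this is mainly useful when reading stop_times.txt as plain text.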
# Interactive mapping
tmap::tmap_mode('view')
## tmap mode set to interactive viewing
m1 <- tmap::tm_shape(atlsf$shapes) + tmap::tm_lines(alpha = 0.5)
m2 <- tmap::tm_shape(atlsf$stops) + tmap::tm_dots(id = 'stop_name', alpha = 0.5)
tmap::tmap_arrange(m1, m2, sync = TRUE)  # sync = TRUE links panning/zooming across both maps